Problem Note 53937: Smoothing parameter might be incorrect when a BY statement is specified in PROC LOESS
If a BY statement is specified in PROC LOESS and any BY group has a constant response (which correctly yields a smoothing parameter of S=1), then S is also set to 1 for every subsequent BY group, regardless of what the correct value should be. The test program later in this note illustrates the problem.
The only workaround for this problem is to run BY groups that have a constant response separately from the other BY groups.
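The workaround can be automated. The following is a minimal sketch, not an official SAS fix, assuming the Melanoma2 data set and its variables group, Year, and Incidences from the test program in this note. The data set names GroupStats and Melanoma2Flag and the variables MinY, MaxY, and ConstResp are hypothetical names introduced for illustration. A BY group is treated as having a constant response when its minimum and maximum nonmissing responses are equal.

```sas
/* Hypothetical sketch of the workaround: fit constant-response BY groups
   in a separate PROC LOESS step from the other BY groups.               */
title;   /* clear titles left over from earlier steps */

/* MERGE with BY requires the data to be sorted by the BY variable */
proc sort data=Melanoma2;
   by group;
run;

/* One row per BY group with the minimum and maximum response */
proc means data=Melanoma2 noprint nway;
   class group;
   var Incidences;
   output out=GroupStats min=MinY max=MaxY;
run;

/* ConstResp=1 when all nonmissing responses in the BY group are equal */
data Melanoma2Flag;
   merge Melanoma2 GroupStats(keep=group MinY MaxY);
   by group;
   ConstResp = (MinY = MaxY);
run;

/* BY groups with a non-constant response, fit together */
proc loess data=Melanoma2Flag;
   ods select FitSummary;
   where ConstResp = 0;
   by group;
   model Incidences=Year;
run;

/* BY groups with a constant response, fit in a separate step */
proc loess data=Melanoma2Flag;
   ods select FitSummary;
   where ConstResp = 1;
   by group;
   model Incidences=Year;
run;
```

Because the constant-response BY groups are excluded from the first PROC LOESS step, the remaining BY groups should not inherit S=1 from a preceding constant group. The second step fits only constant-response groups, for which S=1 is the correct value; if no BY group has a constant response, that step selects no observations and can be skipped.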
Operating System and Release Information
Product Family | Product | System | Reported Release | Fixed Release* |
SAS System | SAS/STAT | Solaris for x64 | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | OpenVMS on HP Integrity | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | Linux | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | Linux for x64 | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | HP-UX IPF | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | 64-bit Enabled Solaris | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | 64-bit Enabled HP-UX | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | 64-bit Enabled AIX | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | Windows Vista for x64 | 9.2 TS1M0 | |
SAS System | SAS/STAT | Windows Vista | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows XP Professional | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Standard Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft® Windows® for x64 | 9.2 TS1M0 | 9.4 TS1M3 |
SAS System | SAS/STAT | Microsoft Windows XP 64-bit Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Enterprise 64-bit Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft Windows Server 2003 Datacenter 64-bit Edition | 9.2 TS1M0 | |
SAS System | SAS/STAT | Microsoft® Windows® for 64-Bit Itanium-based Systems | 9.2 TS1M0 | |
SAS System | SAS/STAT | z/OS | 9.2 TS1M0 | 9.4 TS1M3 |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Below is a test program that illustrates the issue described in this Problem Note. It forms four BY groups from a modified version of the Melanoma data set used in the PROC LOESS Getting Started example "Scatter Plot Smoothing," and it artificially sets the response for Group 2 to a constant. PROC LOESS correctly reports S=1 for BY group 2, but it then also uses S=1 for every subsequent BY group, even where that is not the correct value. A second PROC LOESS step that fits only BY groups 3 and 4 shows that the correct value for Group 3 is S=0.85.
/* Melanoma incidence data for 1936 to 1979. The data are put into 4 groups.
   Group 2 is artificially given a constant incidence rate. */
data Melanoma2;
   input Year Incidences @@;
   format Year d4.0;
   if Year < 1950 then group=1;
   else if Year < 1960 then group=2;
   else if Year < 1970 then group=3;
   else group=4;
   if group=2 then Incidences=3.2;
   datalines;
1936 0.9 1937 0.8 1938 0.8 1939 1.3
1940 1.4 1941 1.2 1942 1.7 1943 1.8
1944 1.6 1945 1.5 1946 1.5 1947 2.0
1948 2.5 1949 2.7 1950 2.9 1951 2.5
1952 3.1 1953 2.4 1954 2.2 1955 2.9
1956 2.5 1957 2.6 1958 3.2 1959 3.8
1960 4.2 1961 3.9 1962 3.7 1963 3.3
1964 3.7 1965 3.9 1966 4.1 1967 3.8
1968 4.7 1969 4.4 1970 4.8 1971 4.8
1972 4.8 1973 4.8 1974 4.9 1975 5.0
1976 4.9 1977 5.8 1978 5.1 1979 5.0
;

title "Constant response in Group 2";
proc sgplot data=Melanoma2;
   scatter x=Year y=Incidences / group=group;
run;

/* PROC LOESS on all 4 BY groups */
title 'A constant response in BY Group=2 causes all subsequent';
title2 'BY groups to have S=1, even when this is not the correct value';
proc loess data=Melanoma2;
   ods select FitSummary;
   by group;
   model Incidences=Year;
run;

/* PROC LOESS on BY groups 3 and 4 */
title "Modeling BY groups 3 and 4 only shows that they do not both have S=1";
proc loess data=Melanoma2;
   ods select FitSummary;
   where Group>2;
   by group;
   model Incidences=Year;
run;
The test program produces the following FitSummary output.

A constant response in BY Group=2 causes all subsequent
BY groups to have S=1, even when this is not the correct value
----------------------------- Group=1 ------------------------------
The LOESS Procedure
Selected Smoothing Parameter: 0.607
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 14
Number of Fitting Points 14
kd Tree Bucket Size 1
Degree of Local Polynomials 1
Smoothing Parameter 0.60714
Points in Local Neighborhood 8
Residual Sum of Squares 0.41878
Trace[L] 3.87019
GCV 0.00408
AICC -1.31137
----------------------------- Group=2 ------------------------------
The LOESS Procedure
Smoothing Parameter: 1
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 10
Number of Fitting Points 7
kd Tree Bucket Size 2
Degree of Local Polynomials 1
Smoothing Parameter 1.00000
Points in Local Neighborhood 10
Residual Sum of Squares 1.63689E-29
Trace[L] 2.55120
GCV 2.95017E-31
AICC -66.28127
----------------------------- Group=3 ------------------------------
The LOESS Procedure
Smoothing Parameter: 1
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 10
Number of Fitting Points 7
kd Tree Bucket Size 2
Degree of Local Polynomials 1
Smoothing Parameter 1.00000
Points in Local Neighborhood 10
Residual Sum of Squares 0.65130
Trace[L] 2.55120
GCV 0.01174
AICC -0.42788
----------------------------- Group=4 ------------------------------
The LOESS Procedure
Smoothing Parameter: 1
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 10
Number of Fitting Points 7
kd Tree Bucket Size 2
Degree of Local Polynomials 1
Smoothing Parameter 1.00000
Points in Local Neighborhood 10
Residual Sum of Squares 0.55771
Trace[L] 2.55120
GCV 0.01005
AICC -0.58301
Modeling BY groups 3 and 4 only shows that they do not both have S=1
----------------------------- Group=3 ------------------------------
The LOESS Procedure
Selected Smoothing Parameter: 0.85
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 10
Number of Fitting Points 10
kd Tree Bucket Size 1
Degree of Local Polynomials 1
Smoothing Parameter 0.85000
Points in Local Neighborhood 8
Residual Sum of Squares 0.50674
Trace[L] 3.00700
GCV 0.01036
AICC -0.37730
----------------------------- Group=4 ------------------------------
The LOESS Procedure
Selected Smoothing Parameter: 1
Dependent Variable: Incidences
Fit Summary
Fit Method kd Tree
Blending Linear
Number of Observations 10
Number of Fitting Points 7
kd Tree Bucket Size 2
Degree of Local Polynomials 1
Smoothing Parameter 1.00000
Points in Local Neighborhood 10
Residual Sum of Squares 0.55771
Trace[L] 2.55120
GCV 0.01005
AICC -0.58301
When a BY statement is used in PROC LOESS and any BY group has a constant response (and thereby S=1), the smoothing parameter (S) is set to 1 in all subsequent BY groups, regardless of what the correct value should be.
Type: | Problem Note |
Priority: | alert |
Topic: | Analytics ==> Regression; SAS Reference ==> Procedures ==> LOESS |
Date Modified: | 2014-08-22 14:58:41 |
Date Created: | 2014-08-16 10:46:57 |